Multi-Aperture Depth Map Using Blur Kernels and Down-Sampling

ABSTRACT

Embodiments relate to different methods for reducing computations used to estimate depth information. One aspect relates to using down-sampled blur kernels. Another aspect relates to processing of edges in the images. Yet another aspect relates to using partial blur kernels, such as single-sided blur kernels. Yet another aspect relates to frequency filtering to reduce energy and noise at frequencies that do not distinguish between different blur kernels.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 62/121,203, “Dual-Aperture Depth Map Using Adaptive PSF Sizing,” filed Feb. 26, 2015. The subject matter of all of the foregoing is incorporated herein by reference in its entirety.

BACKGROUND

1. Field of the Invention

This invention relates to a multi-aperture imaging system that uses multiple apertures of different f-numbers to estimate the depth of an object.

2. Description of Related Art

A dual-aperture camera has two apertures. A narrow aperture, typically at one spectral range such as infrared (IR), produces relatively sharp images over a long depth of focus. A wider aperture, typically at another spectral range such as RGB, produces images that may be blurred for out-of-focus objects. The pairs of images captured using the two different apertures can be processed to generate distance information for an object, for example as described in U.S. patent application Ser. No. 13/579,568, which is incorporated herein by reference. However, conventional processing methods can be computationally expensive.

Therefore, there is a need for improved approaches to depth map generation.

SUMMARY

Embodiments relate to different methods for reducing computations used to estimate depth information. One aspect relates to scaling the size of blur kernels used in the depth processing. The distance range is divided into sub-ranges. A bank of blur kernels is used for each sub-range to estimate distance. For different sub-ranges, the blur kernels and captured images are down-sampled by different factors. In this way, although the original blur kernels may span a large range of sizes, the down-sampled blur kernels will be more limited in size, which reduces computation.

In another aspect, processing of images takes advantage of edges in the images. The same edge in different images may first be normalized to phase match and/or equate energies in the edges of the two images. In another aspect, the edges may be binarized. Binarized edges can be used to reduce computationally expensive convolutions into simpler summing operations.

In another aspect, rather than using full blur kernels, only partial blur kernels are used. For example, single-sided blur kernels may be used in order to accommodate edges caused by occlusions, where the two sides of the edge are at different depths.

In yet another aspect, frequency filtering is used to reduce energy and noise at frequencies that are not useful for distinguishing between different blur kernels.

Other aspects include components, devices, systems, improvements, methods, processes, applications, computer readable mediums, and other technologies related to any of the above.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the disclosure have other advantages and features which will be more readily apparent from the following detailed description and the appended claims, when taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of a multi-aperture, shared sensor imaging system according to one embodiment of the invention.

FIG. 2A is a graph illustrating the spectral responses of a digital camera.

FIG. 2B is a graph illustrating the spectral sensitivity of silicon.

FIGS. 3A-3C depict operation of a multi-aperture imaging system according to one embodiment of the invention.

FIGS. 3D-3E depict operation of an adjustable multi-aperture imaging system according to one embodiment of the invention.

FIG. 4 is a plot of the blur spot sizes B_(vis) and B_(ir) of visible and infrared images, as a function of object distance s.

FIG. 5 is a table of blur spot and blur kernel as a function of object distance s.

FIG. 6A is a diagram illustrating one approach to estimating object distance s.

FIG. 6B is a graph of error e as a function of kernel number k for the architecture of FIG. 6A.

FIG. 7A is a diagram illustrating another approach to estimating object distance s.

FIGS. 7B-7D are graphs of error e as a function of kernel number k for the architecture of FIG. 7A.

FIG. 8 is a diagram illustrating normalization of edges.

FIGS. 9A-9E illustrate a simplified approach for convolution of binarized edges.

FIG. 10 is a diagram illustrating the effect of occlusion.

FIG. 11 is a diagram illustrating a set of single-sided blur kernels with different edge orientations.

FIG. 12 is a frequency diagram illustrating the effect of frequency filtering.

The figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is a block diagram of a multi-aperture, shared sensor imaging system 100 according to one embodiment of the invention. The imaging system may be part of a digital camera or integrated in a mobile phone, a webcam, a biometric sensor, image scanner or any other multimedia device requiring image-capturing functionality. The system depicted in FIG. 1 includes imaging optics 110 (e.g., a lens and/or mirror system), a multi-aperture system 120 and an image sensor 130. The imaging optics 110 images objects 150 from a scene onto the image sensor. In FIG. 1, the object 150 is in focus, so that the corresponding image 160 is located at the plane of the sensor 130. As described below, this will not always be the case. Objects that are located at other depths will be out of focus at the image sensor 130.

The multi-aperture system 120 includes at least two apertures, shown in FIG. 1 as apertures 122 and 124. In this example, aperture 122 is the aperture that limits the propagation of visible light, and aperture 124 limits the propagation of infrared or other non-visible light. In this example, the two apertures 122, 124 are placed together, but they could also be separated. This type of multi-aperture system 120 may be implemented by wavelength-selective optical components, such as wavelength filters. As used in this disclosure, terms such as “light,” “optics” and “optical” are not meant to be limited to the visible part of the electromagnetic spectrum but also include other parts of the electromagnetic spectrum where imaging may occur, including wavelengths that are shorter than visible (e.g., ultraviolet) and wavelengths that are longer than visible (e.g., infrared).

The sensor 130 detects both the visible image corresponding to aperture 122 and the infrared image corresponding to aperture 124. In effect, there are two imaging systems that share a single sensor array 130: a visible imaging system using optics 110, aperture 122 and sensor 130; and an infrared imaging system using optics 110, aperture 124 and sensor 130. The imaging optics 110 in this example is fully shared by the two imaging systems, but this is not required. In addition, the two imaging systems do not have to be visible and infrared. They could be other spectral combinations: red and green, or infrared and white (i.e., visible but without color), for example.

The exposure of the image sensor 130 to electromagnetic radiation is typically controlled by a shutter 170 and the apertures of the multi-aperture system 120. When the shutter 170 is opened, the aperture system controls the amount of light and the degree of collimation of the light exposing the image sensor 130. The shutter 170 may be a mechanical shutter or, alternatively, the shutter may be an electronic shutter integrated in the image sensor. The image sensor 130 typically includes rows and columns of photosensitive sites (pixels) forming a two-dimensional pixel array. The image sensor may be a CMOS (complementary metal oxide semiconductor) active pixel sensor or a CCD (charge coupled device) image sensor. Alternatively, the image sensor may relate to other Si (e.g. a-Si), III-V (e.g. GaAs) or conductive polymer based image sensor structures.

When the light is projected by the imaging optics 110 onto the image sensor 130, each pixel produces an electrical signal, which is indicative of the electromagnetic radiation (energy) incident on that pixel. In order to obtain color information and to separate the color components of an image which is projected onto the imaging plane of the image sensor, typically a color filter array 132 is interposed between the imaging optics 110 and the image sensor 130. The color filter array 132 may be integrated with the image sensor 130 such that each pixel of the image sensor has a corresponding pixel filter. Each color filter is adapted to pass light of a predetermined color band onto the pixel. Usually a combination of red, green and blue (RGB) filters is used. However, other filter schemes are also possible, e.g. CYGM (cyan, yellow, green, magenta), RGBE (red, green, blue, emerald), etc. Alternately, the image sensor may have a stacked design where red, green and blue sensor elements are stacked on top of each other rather than relying on individual pixel filters.

Each pixel of the exposed image sensor 130 produces an electrical signal proportional to the electromagnetic radiation passed through the color filter 132 associated with the pixel. The array of pixels thus generates image data (a frame) representing the spatial distribution of the electromagnetic energy (radiation) passed through the color filter array 132. The signals received from the pixels may be amplified using one or more on-chip amplifiers. In one embodiment, each color channel of the image sensor may be amplified using a separate amplifier, thereby allowing the ISO speed to be controlled separately for different colors.

Further, pixel signals may be sampled, quantized and transformed into words of a digital format using one or more analog to digital (A/D) converters 140, which may be integrated on the chip of the image sensor 130. The digitized image data are processed by a processor 180, such as a digital signal processor (DSP) coupled to the image sensor, which is configured to perform well known signal processing functions such as interpolation, filtering, white balance, brightness correction, and/or data compression techniques (e.g. MPEG or JPEG type techniques).

The processor 180 may include signal processing functions 184 for obtaining depth information associated with an image captured by the multi-aperture imaging system. These signal processing functions may provide a multi-aperture imaging system with extended imaging functionality including variable depth of focus, focus control and stereoscopic 3D image viewing capabilities. The details and the advantages associated with these signal processing functions will be discussed hereunder in more detail.

The processor 180 may also be coupled to additional compute resources, such as additional processors, storage memory for storing captured images and program memory for storing software programs. A controller 190 may also be used to control and coordinate operation of the components in imaging system 100. Functions described as performed by the processor 180 may instead be allocated among the processor 180, the controller 190 and additional compute resources.

As described above, the sensitivity of the imaging system 100 is extended by using infrared imaging functionality. To that end, the imaging optics 110 may be configured to allow both visible light and infrared light, or at least part of the infrared spectrum, to enter the imaging system. Filters located at the entrance aperture of the imaging optics 110 are configured to allow at least part of the infrared spectrum to enter the imaging system. In particular, imaging system 100 typically would not use infrared blocking filters, usually referred to as hot-mirror filters, which are used in conventional color imaging cameras for blocking infrared light from entering the camera. Hence, the light entering the multi-aperture imaging system may include both visible light and infrared light, thereby allowing extension of the photo-response of the image sensor to the infrared spectrum. In cases where the multi-aperture imaging system is based on spectral combinations other than visible and infrared, corresponding wavelength filters would be used.

FIGS. 2A and 2B are graphs showing the spectral responses of a digital camera. In FIG. 2A, curve 202 represents a typical color response of a digital camera without an infrared blocking filter (hot-mirror filter). As can be seen, some infrared light passes through the color pixel filters. FIG. 2A shows the photo-responses of a conventional blue pixel filter 204, green pixel filter 206 and red pixel filter 208. The color pixel filters, in particular the red pixel filter, may transmit infrared light, so that a part of the pixel signal may be attributed to the infrared. FIG. 2B depicts the response 220 of silicon (i.e. the main semiconductor component of an image sensor used in digital cameras). The sensitivity of a silicon image sensor to infrared radiation is approximately four times higher than its sensitivity to visible light.

In order to take advantage of the spectral sensitivity provided by the image sensor as illustrated by FIGS. 2A and 2B, the image sensor 130 in the imaging system in FIG. 1 may be a conventional image sensor. In a conventional RGB sensor, the infrared light is mainly sensed by the red pixels. In that case, the DSP 180 may process the red pixel signals in order to extract the low-noise infrared information. Alternatively, the image sensor may be especially configured for imaging at least part of the infrared spectrum. The image sensor may include, for example, one or more infrared (I) pixels in addition to the color pixels, thereby allowing the image sensor to produce an RGB color image and a relatively low-noise infrared image.

An infrared pixel may be realized by covering a pixel with a filter material which substantially blocks visible light and substantially transmits infrared light, preferably infrared light within the range of approximately 700 to 1100 nm. The infrared transmissive pixel filter may be provided in an infrared/color filter array (ICFA) and may be realized using well known filter materials having a high transmittance for wavelengths in the infrared band of the spectrum, for example a black polyimide material sold by Brewer Science under the trademark “DARC 400”.

Such filters are described in more detail in US2009/0159799, “Color infrared light sensor, camera and method for capturing images,” which is incorporated herein by reference. In one design, an ICFA contains blocks of pixels, e.g. a block of 2×2 pixels, where each block comprises a red, green, blue and infrared pixel. When exposed, such an ICFA image sensor produces a raw mosaic image that includes both RGB color information and infrared information. After processing the raw mosaic image, an RGB color image and an infrared image may be obtained. The sensitivity of such an ICFA image sensor to infrared light may be increased by increasing the number of infrared pixels in a block. In one configuration (not shown), the image sensor filter array uses blocks of sixteen pixels, with four color pixels (RGGB) and twelve infrared pixels.

Instead of an ICFA image sensor (where color pixels are implemented by using color filters for individual sensor pixels), in a different approach, the image sensor 130 may use an architecture where each photo-site includes a number of stacked photodiodes. Preferably, the stack contains four stacked photodiodes responsive to the primary colors RGB and infrared, respectively. These stacked photodiodes may be integrated into the silicon substrate of the image sensor.

The multi-aperture system, e.g. a multi-aperture diaphragm, may be used to improve the depth of field (DOF) or other depth aspects of the camera. The DOF determines the range of distances from the camera that are in focus when the image is captured. Within this range the object is acceptably sharp. For moderate to large distances and a given image format, DOF is determined by the focal length of the imaging optics, the f-number associated with the lens opening (the aperture), and/or the object-to-camera distance s. The wider the aperture (the more light received) the more limited the DOF. DOF aspects of a multi-aperture imaging system are illustrated in FIGS. 3A-3C.

Consider first FIG. 3B, which shows the imaging of an object 150 onto the image sensor 330. Visible and infrared light may enter the imaging system via the multi-aperture system 320. In one embodiment, the multi-aperture system 320 may be a filter-coated transparent substrate. One filter coating 324 may have a central circular hole of diameter D1. The filter coating 324 transmits visible light and reflects and/or absorbs infrared light. An opaque cover 322 has a larger circular opening with a diameter D2. The cover 322 does not transmit either visible or infrared light. It may be a thin-film coating which reflects both infrared and visible light or, alternatively, the cover may be part of an opaque holder for holding and positioning the substrate in the optical system. This way, the multi-aperture system 320 acts as a circular aperture of diameter D2 for visible light and as a circular aperture of smaller diameter D1 for infrared light. The visible light system has a larger aperture and faster f-number than the infrared light system. Visible and infrared light passing the aperture system are projected by the imaging optics 310 onto the image sensor 330.

The pixels of the image sensor may thus receive a wider-aperture optical image signal 352B for visible light, overlaying a second narrower-aperture optical image signal 354B for infrared light. The wider-aperture visible image signal 352B will have a shorter DOF, while the narrower-aperture infrared image signal 354B will have a longer DOF. In FIG. 3B, the object 150B is located at the plane of focus N, so that the corresponding image 160B is in focus at the image sensor 330.

Objects 150 close to the plane of focus N of the lens are projected onto the image sensor plane 330 with relatively small defocus blur. Objects away from the plane of focus N are projected onto image planes that are in front of or behind the image sensor 330. Thus, the image captured by the image sensor 330 is blurred. Because the visible light 352B has a faster f-number than the infrared light 354B, the visible image will blur more quickly than the infrared image as the object 150 moves away from the plane of focus N. This is shown by FIGS. 3A and 3C and by the blur diagrams at the right of each figure.

Most of FIG. 3B shows the propagation of rays from object 150B to the image sensor 330. The righthand side of FIG. 3B also includes a blur diagram 335, which shows the blurs resulting from imaging of visible light and of infrared light from an on-axis point 152 of the object. In FIG. 3B, the on-axis point 152 produces a visible blur 332B that is relatively small and also produces an infrared blur 334B that is also relatively small. That is because, in FIG. 3B, the object is in focus.

FIGS. 3A and 3C show the effects of defocus. In FIG. 3A, the object 150A is located to one side of the nominal plane of focus N. As a result, the corresponding image 160A is formed at a location in front of the image sensor 330. The light travels the additional distance to the image sensor 330, thus producing larger blur spots than in FIG. 3B. Because the visible light 352A has a faster f-number, it diverges more quickly and produces a larger blur spot 332A. The infrared light 354A has a slower f-number, so it produces a blur spot 334A that is not much larger than in FIG. 3B. If the f-number is slow enough, the infrared blur spot may be assumed to be of constant size across the range of depths that are of interest.

FIG. 3C shows the same effect, but in the opposite direction. Here, the object 150C produces an image 160C that would fall behind the image sensor 330. The image sensor 330 captures the light before it reaches the actual image plane, resulting in blurring. The visible blur spot 332C is larger due to the faster f-number. The infrared blur spot 334C grows more slowly with defocus, due to the slower f-number.

The DSP 180 may be configured to process and combine the captured color and infrared images. Improvements in the DOF and the ISO speed provided by a multi-aperture imaging system are described in more detail in U.S. application Ser. No. 13/144,499, “Improving the depth of field in an imaging system”; U.S. application Ser. No. 13/392,101, “Reducing noise in a color image”; U.S. application Ser. No. 13/579,568, “Processing multi-aperture image data”; U.S. application Ser. No. 13/579,569, “Processing multi-aperture image data”; and U.S. application Ser. No. 13/810,227, “Flash system for multi-aperture imaging.” All of the foregoing are incorporated by reference herein in their entirety.

In one example, the multi-aperture imaging system allows a simple mobile phone camera with a typical f-number of 2 (e.g. focal length of 3 mm and a diameter of 1.5 mm) to improve its DOF via a second aperture with an f-number varying e.g. between 6 for a diameter of 0.5 mm up to 15 or more for diameters equal to or less than 0.2 mm. The f-number is defined as the ratio of the focal length f to the effective diameter of the aperture. Preferable implementations include optical systems with an f-number for the visible aperture of approximately 2 to 4, for increasing the sharpness of near objects, in combination with an f-number for the infrared aperture of approximately 16 to 22, for increasing the sharpness of distant objects.

The multi-aperture imaging system may also be used for generating depth information for the captured image. The DSP 180 of the multi-aperture imaging system may include at least one depth function, which typically depends on the parameters of the optical system and which in one embodiment may be determined in advance by the manufacturer and stored in the memory of the camera for use in digital image processing functions.

If the multi-aperture imaging system is adjustable (e.g., a zoom lens), then the depth function typically will also include the dependence on the adjustment. For example, a fixed lens camera may implement the depth function as a lookup table, and a zoom lens camera may have multiple lookup tables corresponding to different focal lengths, possibly interpolating between the lookup tables for intermediate focal lengths. Alternately, it may store a single lookup table for a specific focal length but use an algorithm to scale the lookup table for different focal lengths. A similar approach may be used for other types of adjustments, such as an adjustable aperture. In various embodiments, when determining the distance or change of distance of an object from the camera, a lookup table or a formula provides an estimate of the distance based on one or more of the following parameters: the blur kernel providing the best match between IR and RGB image data; the f-number or aperture size for the IR imaging; the f-number or aperture size for the RGB imaging; and the focal length. In some imaging systems, the physical aperture is constrained in size, so that as the focal length of the lens changes, the diameter of the aperture remains unchanged but the f-number changes. The formula or lookup table could also take this effect into account.
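The following is a minimal sketch of how such a lookup-table depth function might interpolate between focal lengths. The table values, function names and the use of Python with NumPy are illustrative assumptions for exposition, not details of the described system.

    import numpy as np

    # Hypothetical calibration data: for each focal length (mm), a table
    # mapping best-matching blur kernel index k to object distance s (m).
    depth_tables = {
        3.0: np.array([5.0, 3.0, 2.0, 1.5, 1.2, 1.0]),
        4.0: np.array([6.0, 3.6, 2.4, 1.8, 1.4, 1.2]),
    }

    def lookup_distance(k, focal_length):
        # Exact table available: use it directly.
        if focal_length in depth_tables:
            return depth_tables[focal_length][k]
        # Otherwise interpolate linearly between the two nearest tables.
        fs = sorted(depth_tables)
        lo = max(f for f in fs if f < focal_length)
        hi = min(f for f in fs if f > focal_length)
        w = (focal_length - lo) / (hi - lo)
        return (1 - w) * depth_tables[lo][k] + w * depth_tables[hi][k]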

In certain situations, it is desirable to control the relative size of the IR aperture and the RGB aperture. This may be desirable for various reasons. For example, adjusting the relative size of the two apertures may be used to compensate for different lighting conditions. In some cases, it may be desirable to turn off the multi-aperture aspect. As another example, different ratios may be preferable for different object depths, focal lengths or accuracy requirements. Having the ability to adjust the ratio of IR to RGB provides an additional degree of freedom in these situations.

FIG. 3D is a diagram illustrating adjustment of the relative sizes of an IR aperture 324 and visible aperture 322. In this diagram, the hashed annulus is a mechanical shutter 370. On the lefthand side, the mechanical shutter 370 is fully open, so that the visible aperture 322 has maximum area. On the righthand side, the shutter 370 is stopped down, so that the visible aperture 322 has less area but the IR aperture 324 is unchanged; the ratio between visible and IR can thus be adjusted by adjusting the mechanical shutter 370. In FIG. 3E, the IR aperture 324 is located near the edge of the visible aperture 322. Stopping down the mechanical shutter 370 reduces the size (and changes the shape) of the IR aperture 324, and the dual-aperture mode can be eliminated by stopping the shutter 370 to the point where the IR aperture 324 is entirely covered. Similar effects can be implemented by other mechanisms, such as adjusting electronic shuttering or exposure time.

As described above in FIGS. 3A-3C, a scene may contain different objects located at different distances from the camera lens, so that objects closer to the focal plane of the camera will be sharper than objects further away from the focal plane. A depth function may relate sharpness information for different objects located in different areas of the scene to the depth or distance of those objects from the camera. In one embodiment, a depth function is based on the sharpness of the color image components relative to the sharpness of the infrared image components.

Here, the sharpness parameter may relate to the circle of confusion, which corresponds to the blur spot diameter measured by the image sensor. As described above in FIGS. 3A-3C, the blur spot diameter representing the defocus blur is small (approaching zero) for objects that are in focus and grows larger when moving away to the foreground or background in object space. As long as the blur disk is smaller than the maximum acceptable circle of confusion, it is considered sufficiently sharp and part of the DOF range. From the known DOF formulas it follows that there is a direct relation between the depth of an object, e.g. its distance s from the camera, and the amount of blur or sharpness of the captured image of that object. Furthermore, this direct relation is different for the color image than it is for the infrared image, due to the difference in apertures and f-numbers.

Hence, in a multi-aperture imaging system, the increase or decrease in sharpness of the RGB components of a color image relative to the sharpness of the IR components in the infrared image is a function of the distance to the object. For example, if the lens is focused at 3 meters, the sharpness of both the RGB components and the IR components may be the same. In contrast, due to the small aperture used for the infrared image, for objects at a distance of 1 meter the sharpness of the RGB components may be significantly less than that of the infrared components. This dependence may be used to estimate the distances of objects from the camera.

In one approach, the imaging system is set to a large (“infinite”) focus point. That is, the imaging system is designed so that objects at infinity are in focus. This point is referred to as the hyperfocal distance H of the multi-aperture imaging system. The system may then determine the points in an image where the color and the infrared components are equally sharp. These points in the image correspond to objects that are in focus, which in this example means that they are located at a relatively large distance (typically the background) from the camera. For objects located away from the hyperfocal distance H (i.e., closer to the camera), the relative difference in sharpness between the infrared components and the color components will change as a function of the distance s between the object and the lens.

The sharpness may be obtained empirically by measuring the sharpness (or, equivalently, the blurriness) for one or more test objects at different distances s from the camera lens. It may also be calculated based on models of the imaging system. In one embodiment, sharpness is measured by the absolute value of the high-frequency infrared components in an image. In another approach, blurriness is measured by the blur size or point spread function (PSF) of the imaging system.

FIG. 4 is a plot of the blur spot sizes B_(vis) and B_(ir) of the visible and infrared images, as a function of object distance s. FIG. 4 shows that around the focal distance N, which in this example is the hyperfocal distance, the blur spots are the smallest. Away from the focal distance N, the color components experience rapid blurring and a rapid increase in the blur spot size B_(vis). In contrast, as a result of the relatively small infrared aperture, the infrared components do not blur as quickly and, if the f-number is slow enough, the blur spot size B_(ir) may be approximated as constant in size over the range of depths considered.

Now consider the object distance s_(x). At this object distance, the infrared image is produced with a blur spot 410 and the visible image is produced with a blur spot 420. Conversely, if the blur spot sizes were known, or the ratio of the blur spot sizes were known, this information could be used to estimate the object distance s_(x). Recall that the blur spot, also referred to as the point spread function, is the image produced by a single point source. If the object were a single point source, then the infrared image will be a blur spot of size 410 and the corresponding visible image will be a blur spot of size 420.

FIG. 5 illustrates one approach to estimating the object distance based on the color and infrared blur spots. FIG. 5 is a table of blur spots as a function of object distance s. For each object distance s_(k), there is shown a corresponding IR blur spot (PSF_(ir)) and color blur spot (PSF_(vis)). The IR image I_(ir) is the convolution of an ideal image I_(ideal) with PSF_(ir), and the color image I_(vis) is the convolution of the ideal image I_(ideal) with PSF_(vis):

I_(ir) = I_(ideal) * PSF_(ir)  (1)

I_(vis) = I_(ideal) * PSF_(vis)  (2)

where * is the convolution operator. Manipulating these two equations yields

I_(vis) = I_(ir) * B  (3)

where B is a blur kernel that accounts for deblurring of the IR image followed by blurring of the visible image. The blur kernels B can be calculated in advance or empirically measured as a function of object depth s, producing a table as shown in FIG. 5.

In FIG. 5, the blur kernel B is shown as similar in size to the visible blur spot PSF_(vis). Under certain circumstances, the IR blur spot PSF_(ir) may be neglected or otherwise accounted for. For example, if the IR blur spot is small relative to the visible blur spot PSF_(vis), then the effect of neglecting the IR blur may be negligible. As another example, if the IR blur spot does not vary significantly with object distance, then it may be neglected for purposes of calculating the blur kernel B, but accounted for by a systematic adjustment of the results.
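As a sketch of how the blur kernels B might be calculated in advance from calibrated PSFs, Eqs. (1)-(3) can be inverted in the frequency domain. The function below assumes the two PSFs are same-sized 2-D arrays; the epsilon regularization and the final normalization are illustrative choices, not details from the described system.

    import numpy as np

    def blur_kernel(psf_vis, psf_ir, eps=1e-6):
        # Deblur by the IR PSF, then blur by the visible PSF, done as
        # a single division in the frequency domain (Eq. (3) inverted).
        H_vis = np.fft.fft2(psf_vis)
        H_ir = np.fft.fft2(psf_ir)
        B = np.real(np.fft.ifft2(H_vis / (H_ir + eps)))
        return B / B.sum()  # scale so B preserves total energy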

FIG. 6A is a diagram illustrating a method for producing an estimate s* of the object distance s using a bank 610 of blur kernels B_(k). The infrared image I_(ir) is blurred by each of the blur kernels B_(k) in the bank. In this example, the blurring is accomplished by convolution, although faster approaches will be discussed below. This results in estimated visible images I*_(vis).

Each of these estimated images I*_(vis) is compared 620 to the actual visible image I_(vis). In this example, the comparison is a sum squared error e_(k) between the two images.

FIG. 6B is a graph of error e as a function of kernel number k for the architecture of FIG. 6A. Recall that each kernel number k corresponds to a specific object distance s. The error metrics e are processed 630 to yield an estimate s* of the object distance. In one approach, the minimum error e_(k) is identified, and the estimated object distance s* is the object depth s_(k) corresponding to the minimum error e_(k). Other approaches can also be used. For example, the functional pairs (s_(k), e_(k)) can be interpolated to find the value of s that yields the minimum e.
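A minimal sketch of this estimator follows, assuming a precomputed bank of blur kernels and the object distances they correspond to; the names, and the use of SciPy's convolve2d, are illustrative assumptions.

    import numpy as np
    from scipy.signal import convolve2d

    def estimate_distance(ir_win, vis_win, kernels, distances):
        # Blur the IR window with each kernel B_k and compute the
        # sum squared error e_k against the actual visible window.
        errors = [np.sum((convolve2d(ir_win, B, mode="same") - vis_win) ** 2)
                  for B in kernels]
        k = int(np.argmin(errors))  # kernel with minimum error e_k
        return distances[k], errors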

The infrared image I_(ir) and visible image I_(vis) in FIG. 6A typically are not the entire captured images. Rather, the approach of FIG. 6A can be applied to different windows within the image in order to estimate the depth of the objects in the window. In this way, a depth map of the entire image can be produced.
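For example, the estimator sketched above could be slid across the captured images to produce a coarse depth map, as below; the window size and stride are arbitrary illustrative choices, and estimate_distance is reused from the previous sketch.

    import numpy as np

    def depth_map(ir_img, vis_img, kernels, distances, win=32, step=16):
        rows = (ir_img.shape[0] - win) // step + 1
        cols = (ir_img.shape[1] - win) // step + 1
        out = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                r, c = i * step, j * step  # top-left corner of this window
                out[i, j], _ = estimate_distance(
                    ir_img[r:r + win, c:c + win],
                    vis_img[r:r + win, c:c + win],
                    kernels, distances)
        return out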

The approach of FIG. 6A includes a convolution for each blur kernel. If the window and blur kernel B_(k) are each large, the convolution can be computationally expensive. The blur kernels B_(k) by definition will vary in size. For example, the smallest blur kernel may be 3×3 while the largest may be 25×25 or larger. In order to accommodate the largest blur kernels, the window should be at least the same size as the largest blur kernel, which means a large window size is required for a bank that includes a large blur kernel. Furthermore, the same window should be used for all blur kernels in order to allow direct comparison of the calculated error metrics. Therefore, if the bank includes a large blur kernel, a large window will be used for all blur kernels, which can lead to computationally expensive convolutions.

FIG. 7A is a diagram illustrating a variation of FIG. 6A that addresses this issue. Rather than using a single bank of blur kernels, as in FIG. 6A, the approach of FIG. 7A uses multiple banks 710a-m of blur kernels. Each bank contains multiple blur kernels. However, each bank 710 is down-sampled by a different down-sampling factor. For example, bank 710a may use the smallest blur kernels and the original images without down-sampling, bank 710b may use the next smallest set of kernels but with down-sampling of 2×, and so on. In FIG. 7A, bank 710m uses down-sampling of m×. The visible image and the infrared image are also down-sampled by m×, as indicated by the boxes marked “/m”. Bank 710m uses blur kernels J to (J+K), each of which is also down-sampled by m×, as indicated by the “/m” in “*B_(J)/m”. Each bank 710 produces a result, for example an estimated object distance s_(m)*, and these are combined 730 into an overall depth estimate s*.

One advantage of this approach is that down-sampled blur kernels are smaller and therefore require less computation for convolution and other operations. The table below shows a set of 9 blur kernels, ranging in size from 3×3 for blur kernel 1, to 25×25 for blur kernel 9. In the approach of FIG. 6A, blur kernel 9 would be 25×25, with a corresponding number of multiply-accumulates used to implement convolution. In contrast, in the table below, all blur kernels are down-sampled so that no convolution uses a kernel larger than 5×5.

TABLE 1

  Kernel number (k)    Size of blur kernel    Down-sampling factor
  1                     3 × 3                 1x
  2                     5 × 5                 2x
  3                     8 × 8                 2x
  4                    11 × 11                3x
  5                    14 × 14                3x
  6                    17 × 17                4x
  7                    20 × 20                4x
  8                    23 × 23                5x
  9                    25 × 25                5x
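The down-sampling of Table 1 can be sketched as follows, assuming block averaging as the down-sampling operation (one plausible choice; kernels whose size is not a multiple of the factor are truncated here, though padding is another option). With the sizes in Table 1, no down-sampled kernel exceeds 5×5.

    import numpy as np

    def downsample(a, m):
        # Block-average down-sampling by integer factor m.
        h, w = (a.shape[0] // m) * m, (a.shape[1] // m) * m
        return a[:h, :w].reshape(h // m, m, w // m, m).mean(axis=(1, 3))

    # (kernel size, down-sampling factor) pairs from Table 1.
    table1 = [(3, 1), (5, 2), (8, 2), (11, 3), (14, 3),
              (17, 4), (20, 4), (23, 5), (25, 5)]
    for size, m in table1:
        B = np.ones((size, size)) / size**2  # placeholder uniform kernel
        assert max(downsample(B, m).shape) <= 5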

FIGS. 7B and 7C are graphs of error as a function of blur kernel number k for the architecture of FIG. 7A. If the down-sampling is performed without normalizing energies, then the error curve may exhibit discontinuities when transitioning from one bank to the next. FIG. 7B shows an error curve using five banks. Each piece of the curve corresponds to one of the banks. Each piece is continuous because the same down-sampling factor is used for all blur kernels in that bank. However, the down-sampling factor changes from one bank to the next, so the different pieces of the curve may not align correctly. However, the minimum error can still be determined. In this example, curve 750c is the only curve that has a minimum within that curve. The other four curves are either monotonically increasing or monotonically decreasing. Therefore, the minimum error occurs within curve 750c. More sophisticated approaches may also be used. For example, differentials across the entire range of curves may be analyzed to predict the point of minimum error. This approach can be used to avoid local minima, which may be caused by noise or other effects.

In FIG. 7B, the curves are shown as continuous within each bank. However, there may be a limited number of samples for each bank. FIG. 7C is the same as FIG. 7B, except that there are only three samples for each bank. In FIG. 7C, the dashed ovals identify each of the banks. Each of the banks can be classified as monotonically increasing, monotonically decreasing or containing an extremum. In this example, banks 750a and 750b are monotonically decreasing, bank 750c contains an extremum, and banks 750d and 750e are monotonically increasing. Based on these classifications, the minimum error e occurs somewhere within bank 750c. Finer resolution sampling within bank 750c can then be performed to locate the minimum value more accurately.

In FIG. 7D, banks 750a and 750b are monotonically decreasing, and banks 750c and 750d are monotonically increasing. No bank exhibits an internal extremum based on the samples shown. However, based on the gradients of the banks, the minimum lies in the range covered by banks 750b and 750c. In this case, another bank can be constructed that spans the gap between banks 750b and 750c. That bank will then have an internal minimum.
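A sketch of this classification logic, applied to each bank's sampled error values, might look as follows; the labels and the simple gap test are illustrative.

    import numpy as np

    def classify(errors):
        d = np.diff(errors)
        if np.all(d > 0):
            return "increasing"
        if np.all(d < 0):
            return "decreasing"
        return "extremum"  # neither monotone: contains an extremum

    def locate_minimum(bank_errors):
        labels = [classify(e) for e in bank_errors]
        for i, lab in enumerate(labels):
            if lab == "extremum":
                return ("bank", i)        # refine sampling within bank i
            if lab == "increasing" and i > 0 and labels[i - 1] == "decreasing":
                return ("gap", i - 1, i)  # build a bank spanning this gap
        return None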

These figures effectively illustrate different sampling approaches to find the extremum of the error function e(k). As another variation, the error function e(k) may be coarsely sampled at first in order to narrow the range of k where the minimum error e exists. Finer and finer sampling may be used as the range is narrowed. Other sampling approaches can be used to find the value of kernel number k (and the corresponding object distance) where the extremum of the error function e(k) occurs.

Down-sampling can be implemented in other ways. For example, the visible images may be down-sampled first. The blur kernels are then down-sampled to match the down-sampling of the visible images. The down-sampled blur kernels are applied to the full resolution IR images. The result is an intermediate form which retains the full resolution of the IR image but then is down-sampled to match the resolution of the down-sampled visible images. This method is not as efficient as fully down-sampling the IR image but is more efficient than not using down-sampling at all. This approach may be beneficial to reduce computation while still maintaining a finer resolution.

Another aspect is that the approach of FIG. 6A depends on the content of the window. For example, a window for which the only object is a single point source (e.g., a window containing a single star surrounded entirely by black night sky) will yield a good result, because that image is a direct measure of the underlying point spread functions. Similarly, a window that contains the image of only an edge will also yield a good result, because that image is a direct measure of the underlying point spread functions, albeit only along one direction. At the other extreme, a window that is constant and has no features will not yield any estimate, because every estimated visible image will also be a constant, so there is no way to distinguish the different blur kernels. Other images may be somewhere between these extremes. Features will help distinguish the different blur kernels. Featureless areas will not, and typically will also add unwanted noise.

In one approach, the windows are selected to include edges. Edge identification can be accomplished using known algorithms. Once identified, edges preferably are processed to normalize variations between the different captured images. FIG. 8 shows one example. In this example, the green component I_(grn) of the color image is the fast f-number image and the IR image I_(ir) is the slow f-number image. The left column of FIG. 8 shows processing of the green image, while the right column shows processing of the IR image. The top row shows the same edge appearing in both images. The object is not in focus, so the green edge is blurred relative to the IR edge. Also note that the edge has different phase in the two images. The green edge transitions from high to low amplitude, while the IR edge transitions from low to high amplitude. FIG. 8 shows one approach to normalize these edges to allow comparisons using blur kernels as described above.

The second row of FIG. 8 shows both edges after differentiation 810. The absolute value 820 of the derivatives is then taken, yielding the third row of FIG. 8. This effectively removes the phase mismatch between the two edges, yielding two phase-matched edges. The two edges are then scaled 830, resulting in the bottom row of FIG. 8. In this example, the IR image is binarized to take on only the values 0 or 1, and the green image is scaled in amplitude to have equal energy to the IR image. The blur kernels are also scaled in amplitude so that, although a blur kernel might spread the energy in an image over a certain area, it does not increase or decrease the total energy. This then allows a direct comparison between the actual green edge and the estimated green edges calculated by applying the blur kernels to the IR edge.
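On 1-D edge profiles, the normalization of FIG. 8 might be sketched as below; the binarization threshold of half the maximum is an illustrative assumption.

    import numpy as np

    def normalize_edges(green, ir):
        g = np.abs(np.diff(green))                  # differentiate 810, abs 820
        r = np.abs(np.diff(ir))
        ir_bin = (r > 0.5 * r.max()).astype(float)  # 830: binarize the IR edge
        g = g * (ir_bin.sum() / g.sum())            # 830: equate edge energies
        return g, ir_bin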

Note that the IR edge looks like a line source. This is not uncommon, since the IR point spread function is small and fairly constant over a range of depths, compared to the color point spread function. Also recall that in FIG. 6A, the IR image is convolved with many different blur kernels. The convolution can be simplified as follows. First, the IR edge is binarized, so that the IR image is a binary image taking on only the values 0 or 1. (In step 830 above, the color image is then scaled in amplitude to have equal energy to the binary IR image.) Convolution generally requires multiplies and adds. However, when the image only takes values of 0 or 1, the multiplies are simplified. Multiplying by 0 yields all 0's, so pixels with value 0 can be ignored. Multiplying by 1 yields the blur kernel, so no actual multiplication is required. Rather, any pixel with value 1 causes an accumulation of the blur kernel centered on that pixel.

FIGS. 9A-9E illustrate this concept. FIG. 9A shows a 4×4 window with a binarized edge, where the pixels are either 1 or 0. FIG. 9B shows a 3×3 blur kernel to be convolved with the window. FIGS. 9C-9E show the progression of the convolution using only adds and no multiplies. In these figures, the lefthand side shows the binarized edge of FIG. 9A and the righthand side shows the progression of the convolution. In FIG. 9C, pixel 910 has been processed, meaning that the blur kernel centered on pixel 910 has been added to the moving sum on the right. In FIG. 9D, the next pixel along the edge, 911, has been processed. The blur kernel centered on pixel 911 is added to the moving sum, which already contains the effect of pixel 910. The result is shown on the right. This continues for all pixels with value 1. FIG. 9E shows the final result after all four edge pixels have been processed. This is the estimated green edge, which can then be compared to the actual green edge. If the two match well, then the blur kernel shown in FIG. 9B is the correct blur kernel for this window and can be used to estimate the object distance for this edge.
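A sketch of this adds-only convolution is shown below: each 1-valued pixel accumulates a shifted copy of the kernel into a running sum, and 0-valued pixels are skipped entirely.

    import numpy as np

    def binary_convolve(edge, kernel):
        kh, kw = kernel.shape
        out = np.zeros((edge.shape[0] + kh - 1, edge.shape[1] + kw - 1))
        for r, c in zip(*np.nonzero(edge)):    # visit only the 1-valued pixels
            out[r:r + kh, c:c + kw] += kernel  # accumulate the shifted kernel
        return out  # full convolution of the binary edge with the kernel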

Edges in an image may be caused by a sharp transition within an object, for example the border between black and white squares on a checkerboard. In that case, the approach shown in FIGS. 9A-9E may be implemented using entire blur kernels. However, edges may also be caused by occlusion, when a closer object partially blocks a more distant object. In FIG. 10, the sign 1010 in the foreground partially blocks the house 1020 in the background. This creates an edge 1030 in the image. However, the left side of the edge is the sign 1010, which is at a closer object distance, and the right side of the edge is the house 1020, which is at a farther object distance. The two different object distances correspond to different blur kernels. Applying a single blur kernel to the edge will not give good results, because when one side is matched to the blur kernel, the other side will not be.

Single-sided blur kernels can be used instead. A single-sided blur kernel is half a blur kernel instead of an entire blur kernel. FIG. 11 shows a set of eight single-sided blur kernels with different edge orientations, based on the 3×3 blur kernel of FIG. 9B. The full 3×3 blur kernel is reproduced in the center of FIG. 11. Note that different single-sided blur kernels can be derived from the same full blur kernel, depending on the orientation of the edge. In FIG. 11, the solid line 1110 represents the edge. These single-sided blur kernels can be applied to binarized edges, as described above, to yield different depth estimates for each side of the edge.
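One way to derive single-sided kernels from a full kernel is sketched below for a vertical edge; keeping the center column and renormalizing the energy are illustrative choices, and the other orientations of FIG. 11 follow by rotating the mask.

    import numpy as np

    def single_sided(kernel, side="left"):
        half = kernel.copy()
        c = kernel.shape[1] // 2
        if side == "left":
            half[:, c + 1:] = 0   # keep the center column and the left half
        else:
            half[:, :c] = 0       # keep the center column and the right half
        return half / half.sum()  # renormalize so total energy is preserved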

FIG. 12 illustrates another aspect of the approach described above. As described above, a bank of blur kernels of varying sizes is used to estimate the object depth. Blur kernels effectively act as low pass filters. Larger blur kernels cause more blurring and therefore have lower cutoff frequencies compared to smaller blur kernels. FIG. 12 shows a generalized frequency response for a bank of blur kernels. Blur kernel 1210A is the low pass filter with the lowest cutoff frequency in the bank, which corresponds to the blur kernel with the largest blur size. Blur kernel 1210B is the second largest blur kernel, and so on to blur kernel 1210D, which has the highest cutoff frequency and smallest blur size. The IR image is blurred by each of these blur kernels, and the results are compared to determine which blur kernel corresponds to the object depth.

However, note that the blur kernels 1210A-D differ only within the frequency range 1220. Outside this frequency range 1220, all of the blur kernels 1210A-D in the bank have the same behavior. Therefore, content outside the frequency range 1220 will not distinguish between the different blur kernels 1210A-D. However, that content will add to background noise. Therefore, in one approach, frequency filtering is added to reduce energy and noise from outside the frequency range 1220. In one approach, the original images are frequency filtered. In another approach, the blur kernels may be frequency filtered versions. The frequency filtering may be low pass filtering to reduce frequency content above frequency 1220B, high pass filtering to reduce frequency content below frequency 1220A, or bandpass filtering to reduce both the low frequency and high frequency content. The filtering may take different forms and may be performed regardless of whether down-sampling is also used. When it is used, down-sampling is a type of low pass filtering.

The filtering may also be applied to fewer than or more than all the blur kernels in a bank. For example, a narrower bandpass filter may be used if it is desired to distinguish only blur kernels 1210A and 1210B (i.e., to determine the error gradient between blur kernels 1210A and 1210B). Most of the difference between those two blur kernels occurs in the frequency band 1230, so a bandpass filter that primarily passes frequencies within that range and rejects frequencies outside it will increase the relative signal available for distinguishing the two blur kernels 1210A and 1210B.
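A sketch of such bandpass filtering is given below, assuming the useful band (the range 1220, or the narrower band 1230) is known from the kernels' frequency responses; the hard FFT mask is a simple illustrative choice, and a smoother filter would reduce ringing.

    import numpy as np

    def bandpass(img, f_lo, f_hi):
        F = np.fft.fft2(img)
        fy = np.fft.fftfreq(img.shape[0])[:, None]
        fx = np.fft.fftfreq(img.shape[1])[None, :]
        radius = np.hypot(fy, fx)                   # radial spatial frequency
        mask = (radius >= f_lo) & (radius <= f_hi)  # keep only the useful band
        return np.real(np.fft.ifft2(F * mask))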

Window sizes and locations preferably are selected based on the above considerations, and the window size may be selected independently of the blur kernel size. For example, the window size may be selected to be large enough to contain features such as edges, small enough to avoid interfering features such as closely spaced parallel edges, and generally only large enough to allow processing of features, since larger windows will add more noise. The size of the blur kernel may be selected to reduce computation (e.g., by down-sampling) and also possibly to provide sufficient resolution for the depth estimation. As a result, the window size may be different from (typically, larger than) the size of the blur kernels.

The number of windows and window locations may also be selected to contain features such as edges, and to reduce computation. A judicious choice of windows can reduce power consumption by having fewer pixels to power up and to read out, which in turn can be used to increase the frame rate. A higher frame rate may be advantageous for many reasons, for example in enabling finer control of gesture tracking.

Embodiments of the invention may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored.

It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Moreover, the invention is not limited to the embodiments described above, which may be varied within the scope of the accompanying claims. For example, aspects of this technology have been described with respect to different f-number images captured by a multi-aperture imaging system. However, these approaches are not limited to multi-aperture imaging systems. They can also be used in other systems that estimate depth based on differences in blurring, regardless of whether a multi-aperture imaging system is used to capture the images. For example, two images may be captured in time sequence, but at different f-number settings. Another method is to capture two or more images of the same scene but with different focus settings, or to rely on differences in aberrations (e.g., chromatic aberrations) or other phenomena that cause the blurring of the two or more images to vary differently as a function of depth, so that these variations can be used to estimate the depth.

CLAIMS

1. A method for processing blurred image data, comprising: downsampling first image data associated with a first image of an object, the first image captured using a first imaging system characterized by a first point spread function; downsampling second image data associated with a second image of the object, the second image captured using a second imaging system characterized by a second point spread function that varies as a function of depth differently than the first point spread function; for each blur kernel from a bank of down-sampled blur kernels, wherein each blur kernel corresponds to the first point spread function relative to the second point spread function at a different object depth, and the bank of blur kernels spans a range of object depths: blurring the down-sampled second image data with the down-sampled blur kernel; and comparing the blurred down-sampled second image data and the down-sampled first image data; and generating depth information for the object based on said comparisons.

2. The method of claim 1, wherein: for each blur kernel, comparing the blurred down-sampled second image data and the down-sampled first image data comprises calculating an error between the blurred down-sampled second image data and the down-sampled first image data; and generating depth information for the object based on said comparisons comprises generating depth information based on a depth that corresponds to the blur kernel with a lowest calculated error.

3. The method of claim 2, wherein blurring the down-sampled second image data with the down-sampled blur kernel comprises: first deblurring the down-sampled second image data; and then blurring the deblurred, down-sampled second image data with the down-sampled blur kernel.

4. The method of claim 2, wherein blurring the down-sampled second image data with the down-sampled blur kernel comprises convolving the down-sampled second image data with the down-sampled blur kernel.

5. The method of claim 1, wherein: the first image data and the second image data each contain a same edge; and for each blur kernel: blurring the down-sampled second image data with the down-sampled blur kernel comprises blurring the edge in the down-sampled second image data with the down-sampled blur kernel; and comparing the blurred down-sampled second image data and the down-sampled first image data comprises comparing the blurred edge in the down-sampled second image data and the same edge in the down-sampled first image data.

6. The method of claim 5, wherein blurring the edge in the down-sampled second image data comprises: binarizing the edge in the down-sampled second image data; and blurring the binarized edge with the down-sampled blur kernel.

7. The method of claim 5, wherein comparing the blurred edge in the down-sampled second image data and the same edge in the down-sampled first image data comprises phase matching the edges in the first and second image data.

8. The method of claim 5, wherein comparing the blurred edge in the down-sampled second image data and the same edge in the down-sampled first image data comprises equating energy in the edges in the first and second image data.

9. The method of claim 9, wherein said blurring and comparing for each blur kernel and said generating depth information are performed for each of a plurality of banks of down-sampled blur kernels, each bank down-sampled by a different downsampling factor.

10. The method of claim 9, wherein the plurality of banks span a contiguous range of object depths.

11. The method of claim 9, further comprising: classifying each bank as containing or not containing an extremum with respect to said comparison; and generating depth information for the object based on said classifications for the bank that contains the extremum.

12. The method of claim 9, further comprising: classifying each bank as monotonically increasing, monotonically decreasing or containing an extremum with respect to said comparison; and generating depth information for the object based on said classifications for the banks.

13. The method of claim 12, further comprising: if the classifications indicate an extremum occurs between two banks, then creating an additional bank that spans between the two banks.

14. The method of claim 9, wherein each bank is down-sampled by a different integer downsampling factor.

15. The method of claim 9, wherein a largest down-sampled blur kernel for each bank is a same size for all the banks.

16. The method of claim 9, wherein all of the blur kernels are sufficiently down-sampled so that no down-sampled blur kernel is larger than 5×5.

17. The method of claim 1, wherein the first imaging system has a first f-number and the second imaging system has a second f-number that is slower than the first f-number, wherein the f-number is defined as a ratio of a focal length and an effective diameter of an aperture, and whereby a size of the second point spread function varies as a function of depth more slowly than a size of the first point spread function.

18. The method of claim 17, further comprising: exposing an image sensor in a multi-aperture shared sensor imaging system to light from the object, using a first aperture with the first f-number to expose the first image and a second aperture with the second f-number to expose the second image.

19. The method of claim 18, wherein the first aperture exposes the first image using light from a first spectral band, and the second aperture exposes the second image using light from a different second spectral band.

20. The method of claim 18, wherein the first aperture exposes the first image using light from a visible spectrum, and the second aperture exposes the second image using light from an infrared spectrum.

21. The method of claim 1, wherein the bank of down-sampled blur kernels comprises a bank of down-sampled single-sided blur kernels.

22. The method of claim 1, further comprising: frequency filtering the second image data.

23. A non-transitory computer-readable storage medium storing executable computer program instructions for processing blurred image data, the instructions executable by a processor and causing the processor to perform a method comprising: downsampling first image data associated with a first image of an object, the first image captured using a first imaging system characterized by a first point spread function; downsampling second image data associated with a second image of the object, the second image captured using a second imaging system characterized by a second point spread function that varies as a function of depth differently than the first point spread function; for each blur kernel from a bank of down-sampled blur kernels, wherein each blur kernel corresponds to the first point spread function of the first imaging system relative to the second point spread function at a different object depth, and the bank of blur kernels spans a range of object depths: blurring the down-sampled second image data with the down-sampled blur kernel; and comparing the blurred down-sampled second image data and the down-sampled first image data; and generating depth information for the object based on said comparisons.